class: center, middle, inverse, title-slide .title[ #
Data visualization
] .subtitle[ ## Course introduction ] .author[ ###
Marta Coronado Zamora; Jose F. Sánchez-Herrero; Miriam Merenciano
] .date[ ### 12 February 2026 ]

---
class: center, middle, animated, bounceInDown

<style>
.title-slide {
  background-image: url(img/logo.png);
  background-size: 250px;
}
</style>

## Keep in touch

#### Theory lessons

<br>

| Marta Coronado Zamora | Jose F. Sánchez |
|:-:|:-:|
| <a href="mailto:Marta.coronado@uab.cat"><i class="fa fa-paper-plane fa-fw"></i> marta.coronado@uab.cat</a> | <a href="mailto:JoseFrancisco.Sanchez@uab.cat"><i class="fa fa-paper-plane fa-fw"></i> josefrancisco.sanchez@uab.cat</a> |
| <a href="https://bsky.app/profile/geneticament.bsky.social"><i class="fab fa-bluesky fa-fw"></i> @geneticament</a> | <a href="https://twitter.com/JFSanchezBioinf"><i class="fab fa-twitter fa-fw"></i> @JFSanchezBioinf</a> |
| <a href="https://portalrecerca.uab.cat/es/organisations/grup-de-gen%C3%B2mica-bioinform%C3%A0tica-i-biologia-evolutiva-gbbe/"><i class="fa fa-map-marker fa-fw"></i> Universitat Autònoma de Barcelona </a> | <a href="http://www.germanstrias.org/technology-services/genomica-bioinformatica/"> <i class="fa fa-map-marker fa-fw"></i>Germans Trias i Pujol Research Institute (IGTP)</a> |

#### Practical lessons

<br>

| Miriam Merenciano |
|:-:|
| <a href="mailto:miriam.merenciano@uab.cat"><i class="fa fa-paper-plane fa-fw"></i> miriam.merenciano@uab.cat </a> |
| <a href="https://portalrecerca.uab.cat/es/organisations/grup-de-gen%C3%B2mica-bioinform%C3%A0tica-i-biologia-evolutiva-gbbe/"><i class="fa fa-map-marker fa-fw"></i> Universitat Autònoma de Barcelona </a> |

---
layout: true
class: animated, fadeIn

---
class: animated, fadeIn

# Course overview

**P0** | Course Introduction. (<font color="#A8A8A8">Marta Coronado Zamora & Miriam Merenciano</font>)<br><br>

--

**T1** | Introduction.
Perception, illusions, inter-individual variability, ranking of visual features and common pitfalls (<font color="#A8A8A8">Marta Coronado</font>)<br><br>

--

**Part 1: Tools for data visualization** (<font color="#A8A8A8">Marta Coronado Zamora and Miriam Merenciano</font>)<br>
**T2** | Basic tools for data visualization (`ggplot2`) - P1, P2, P3 (first assignment)<br>
**T3** | Dynamic and interactive tools (`plotly`, `shiny`) - P4, P5 (second assignment)<br><br>

--

**Part 2: Complex data and dimensionality reduction** (<font color="#A8A8A8">Jose F. Sánchez-Herrero</font>)<br>
**T4** | Introduction: Visualization for exploring complex data & Dimensionality reduction - P6<br>
**T5** | Principal component analysis - P7<br>
**T6** | Non-linear projections: t-SNE - P8 (third assignment)<br>
**T7** | Non-linear projections: UMAP - P9 (fourth assignment)

---
class: animated, fadeIn

# Evaluation

- **10% participation**
  Individual submission at the end of each theory/practical session

--

- **40% group assignments** (minimum grade 4/10)
  4 assignments, 10% each<br><br>

--

- **20% mid-term exam**
- **30% final exam**

The other parts of the evaluation are only taken into account if the weighted average of the mid-term and final exams is at least 3.5 out of 10.

<br>

[Information in the Syllabus](https://www.fib.upc.edu/en/studies/bachelors-degrees/bachelor-degree-bioinformatics/curriculum/syllabus/DV-BBI)

---
class: animated, fadeIn

# Course configuration

--

- **Theoretical sessions**: 12 sessions, Friday: 14 - 16h (A5202)

--

<br>

- **Practical exercises**: 13 sessions, 2 groups. Thursday: 14 - 18h (A5201)
  Complete and submit to [ATENeA](https://atenea.upc.edu/course/view.php?id=105474) - individual submission

--

<br>

- **Home assignments:** Work in groups to analyze data and create visualizations, interactive reports and apps.
  Complete and submit to [ATENeA](https://atenea.upc.edu/course/view.php?id=95546) - submission by group

--

<br>

- **Seminars**: Invited seminars, in person or online, live or broadcast.
  - 27th March | **Clara Inserte**: `clevRvis`: visualization techniques for clonal evolution

---
class: animated, fadeIn

# Group assignments

- __1\.__ Create groups of 3-4 people
  Form groups of 3-4 students to work on the home assignment projects.
- __2\.__ Enroll in ATENeA
  https://atenea.upc.edu/mod/choicegroup/view.php?id=5311480

---
class: animated, fadeIn

# Prior knowledge

Basic knowledge of R and familiarity with RStudio are prerequisites.

--

.pull-left[<center>
<img src="data:image/png;base64,#img/descarga.png" width=500/>
</center>
]

.pull-right[
1. Go to https://wooclap.com
2. Enter the event code: **BJMEAL**
3. Log in (SSO) with ATENeA
]

---
layout: false
class: left, bottom, inverse, animated, bounceInDown

# Let's get started!

## **Tools for data visualization**

---
layout: true
class: animated, fadeIn

---
class: animated, fadeIn

## Types of tools

### Two main types:

--

- Graphical user interface (GUI)
  <font color="#A8A8A8">Many examples: Perseus computational platform, Cytoscape, Blast2GO, Gephi, ... </font>

--

- Code-based
  <font color="#A8A8A8">R (and other programming languages, e.g. Python)</font>
- Interactive code-based `R`
  <font color="#A8A8A8">RStudio (Jupyter Notebook, and other programming languages)</font>

--

[A wide range of tools...](https://en.wikipedia.org/wiki/List_of_information_graphics_software)

<br>

--

<div style="background-color:#FFDA9E">
<b><i class="fas fa-question-circle"></i> Question</b>
<br>
What pros and cons do you think GUI tools have compared with code-based tools?
</div>

---
layout: true
class: animated, fadeIn

---
class: animated, fadeIn

## Slides with R Markdown

### **Interactive <i class="fab fa-r-project fa-fw"></i> documents**

`R` code can be executed within RStudio!
--

```r
value <- 2
value + 3
```

```
## [1] 5
```

--

.pull-left[
```r
# ggplot2 boxplot
library(ggplot2)
ggplot(data = iris,
       mapping = aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot()
```
]

.pull-right[
<img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-3-1.png" width="324" style="display: block; margin: auto;" />
]

---
class: animated, fadeIn

## Visualization libraries in `R`

- `base`
- `grid`: `lattice` and `ggplot2`

<center>
<img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-4-1.png" width="324" /><img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-4-2.png" width="324" /><img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-4-3.png" width="324" />
</center>

<br>

--

<div align="left" style="background-color:#FFDA9E">
<b><i class="fas fa-question-circle"></i> Question</b>
<br>
Describe the graphics. In your opinion, which one is the simplest? And the most complex? Do you think the code used to generate each figure reflects its complexity?
</div>

---
class: animated, fadeIn

## Visualization libraries in `R`

```r
# base
hist(iris$Sepal.Width)

# ggplot2
ggplot(iris, aes(Sepal.Width)) +
  geom_histogram()
```

<center>
<img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-6-1.png" width="324" /><img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-6-2.png" width="324" />
</center>

---
class: animated, fadeIn

## Visualization libraries in `R`

.pull-left[
```r
# base: one call per tree; colours and legend are set by hand
plot(circumference ~ age, data = Orange[Orange$Tree == "4", ],
     type = "l", main = "Base - complex")
lines(circumference ~ age, col = "darkred",
      data = Orange[Orange$Tree == "2", ])
lines(circumference ~ age, col = "orange",
      data = Orange[Orange$Tree == "5", ])
lines(circumference ~ age, col = "yellow",
      data = Orange[Orange$Tree == "1", ])
lines(circumference ~ age, col = "darkgreen",
      data = Orange[Orange$Tree == "3", ])
# Legend colours must match the line colours above
legend("topleft", c("4", "2", "5", "1", "3"), title = "Tree",
       col = c("black", "darkred", "orange", "yellow", "darkgreen"),
       lty = 1)
```

```r
# ggplot2: colours and legend are derived from the Tree column
ggplot(Orange, aes(age, circumference, colour = Tree)) +
  geom_line() +
  labs(title = "ggplot2 - complex")
```
]

--

.pull-right[
<img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-9-1.png" width="360" style="display: block; margin: auto;" />
<img src="data:image/png;base64,#0_files/figure-html/unnamed-chunk-10-1.png" width="360" style="display: block; margin: auto;" />
]

---
class: animated, fadeIn

### Other visualization libraries (outside our scope)

- Python
  + `matplotlib`, `seaborn`
  + `Bokeh`, `pygal`
- Java: `Processing`
- JavaScript: `D3.js`

<center>
<img src="data:image/png;base64,#img/190840954-dc243c99-9295-44de-88e9-fafd0f4f7f8a.jpg" alt="Bokeh plots" width=80%/>
</center>

---
layout: false
class: left, bottom, inverse, animated, bounceInDown

# Basic `R` knowledge

---
layout: true
class: animated, fadeIn

---
class: animated, fadeIn

### Installing a package

```r
# Download and install a package from CRAN
install.packages("ggplot2")

# Download and install a package from GitHub (requires the devtools package)
devtools::install_github("yihui/xaringan")
```

### Loading a package

```r
# Load the libraries into the current session
library("ggplot2")
library("xaringan")
```

### Loading data

```r
# Load a tab-separated file with a header
data <- read.table("data.txt", header = TRUE, sep = "\t")
```

---
class: animated, fadeIn

## Data types and structures

Main data types <font color="#A8A8A8">(the other two, _complex_ and _raw_, will not be discussed)</font>:

- __Logical__: can only take two values, true (`TRUE`, `T`) or false (`FALSE`, `F`)
- __Numeric__: real or decimal numbers (`2`, `15.5`)
- __Integer__: `2L` (the `L` tells R to store this as an integer)
- __Character__: any text, including digits, enclosed in quotes (`"a"`, `"swc"`, `"2"`)

--

<i class="fas fa-info-circle"></i> To find out the data type of a value, use the `class()` function.

```r
type_list <- list(TRUE, 1.2, 10L, "a")
sapply(type_list, class)
```

```
## [1] "logical" "numeric" "integer" "character"
```

---
class: animated, fadeIn

## Data types and structures

Elements of the previous data types may be combined to form data structures.
Main structures:

- __Vector__: collection of elements of a single data type
- __Matrix__: vector with dimensions (the number of rows and columns)
- __Factor__: used to represent categorical variables
- __List__: a special type of vector whose elements can be of different types
- __Data Frame__ <i class="fas fa-star"></i>: a special type of list whose elements (columns) all have the same length

--

```r
# A vector x of mode numeric
x <- c(1, 2, 3)
# A vector y of mode logical
y <- c(TRUE, TRUE, FALSE, FALSE)
# A vector z of mode character
z <- c("ATG", "GGC", "TGA")
```

---
class: animated, fadeIn

## Data types and structures

Elements of the previous data types may be combined to form data structures.
Main structures:

- __Vector__: collection of elements of a single data type
- __Matrix__: vector with dimensions (the number of rows and columns)
- __Factor__: used to represent categorical variables
- __List__: a special type of vector whose elements can be of different types
- __Data Frame__ <i class="fas fa-star"></i>: a special type of list whose elements (columns) all have the same length

```r
matrix22 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
matrix22
```

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

---
class: animated, fadeIn

## Data types and structures

Elements of the previous data types may be combined to form data structures.
Main structures:

- __Vector__: collection of elements of a single data type
- __Matrix__: vector with dimensions (the number of rows and columns)
- __Factor__: used to represent categorical variables
- __List__: a special type of vector whose elements can be of different types
- __Data Frame__ <i class="fas fa-star"></i>: a special type of list whose elements (columns) all have the same length

```r
factor_vector <- as.factor(c("rna", "dna", "dna", "rna"))
factor_vector
```

```
## [1] rna dna dna rna
## Levels: dna rna
```

```r
str(factor_vector)
```

```
##  Factor w/ 2 levels "dna","rna": 2 1 1 2
```

---
class: animated, fadeIn

## Data types and structures

Elements of the previous data types may be combined to form data structures.
Main structures:

- __Vector__: collection of elements of a single data type
- __Matrix__: vector with dimensions (the number of rows and columns)
- __Factor__: used to represent categorical variables
- __List__: a special type of vector whose elements can be of different types
- __Data Frame__ <i class="fas fa-star"></i>: a special type of list whose elements (columns) all have the same length

```r
x <- list(1, "a", TRUE, 1+4i)
x
```

```
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] 1+4i
```

---
class: animated, fadeIn

## Data types and structures

Elements of the previous data types may be combined to form data structures.
Main structures:

- __Vector__: collection of elements of a single data type
- __Matrix__: vector with dimensions (the number of rows and columns)
- __Factor__: used to represent categorical variables
- __List__: a special type of vector whose elements can be of different types
- __Data Frame__ <i class="fas fa-star"></i>: a special type of list whose elements (columns) all have the same length

```r
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
```

```
##    id  x  y
## 1   a  1 11
## 2   b  2 12
## 3   c  3 13
## 4   d  4 14
## 5   e  5 15
## 6   f  6 16
## 7   g  7 17
## 8   h  8 18
## 9   i  9 19
## 10  j 10 20
```

---
class: animated, fadeIn

# Additional resources

- Other subjects within this degree
- DataCamp courses: https://www.datacamp.com/
- Coursera: https://www.coursera.org/
- W3Schools: https://www.w3schools.com/r/

<center>
<img src="data:image/png;base64,#img/W3Schools_logo.svg.png" />
</center>

---
class: animated, fadeIn

### Tidy data

Data frames with one observation per row and one variable per column.
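A minimal sketch of this definition, using a small hypothetical data frame (`plants` and its values are made up for illustration). When the data are tidy, `nrow()` counts observations and `ncol()` counts variables. The same reading applies to `dim(data)` in today's exercise.

```r
# Hypothetical tidy data frame (made-up values): each row is one plant
# (an observation) and each column is one variable measured on it
plants <- data.frame(
  id     = c("P1", "P2", "P3"),
  group  = c("A", "B", "A"),
  height = c(41.2, 44.1, 45.0)
)

nrow(plants)  # 3 observations
ncol(plants)  # 3 variables
dim(plants)   # rows and columns at once: 3 3
```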
<center>
<img src="data:image/png;base64,#img/tidy-1.png" alt="Tidy data: one variable per column, one observation per row" width="900"/>
</center>

---
class: animated, fadeIn

### Tidy data

Two common layouts for the same data:

- **Wide format** (the most common): multiple measurements of a single observation are stored in a single row.

```
##   Student Math Literature PE
## 1       A   99         45 56
## 2       B   73         78 55
## 3       C   12         96 57
```

--

- **Long format**: each row corresponds to one measurement of an observation.

```
## # A tibble: 9 × 3
##   Student Subject    Score
##   <chr>   <chr>      <dbl>
## 1 A       Math          99
## 2 A       Literature    45
## 3 A       PE            56
## 4 B       Math          73
## 5 B       Literature    78
## 6 B       PE            55
## 7 C       Math          12
## 8 C       Literature    96
## 9 C       PE            57
```

---
class: animated, fadeIn

### Tidy data

`tidyr::pivot_longer()` converts from wide format to long format:

```r
library(tidyr)

# Wide table from the previous slide
wide_df <- data.frame(
  Student    = c("A", "B", "C"),
  Math       = c(99, 73, 12),
  Literature = c(45, 78, 96),
  PE         = c(56, 55, 57)
)

long_df <- pivot_longer(
  wide_df,
  cols = c(Math, Literature, PE),  # or cols = -Student
  names_to = "Subject",
  values_to = "Score"
)
long_df
```

```
## # A tibble: 9 × 3
##   Student Subject    Score
##   <chr>   <chr>      <dbl>
## 1 A       Math          99
## 2 A       Literature    45
## 3 A       PE            56
## 4 B       Math          73
## 5 B       Literature    78
## 6 B       PE            55
## 7 C       Math          12
## 8 C       Literature    96
## 9 C       PE            57
```

---
class: animated, fadeIn

# Getting help <i class="fas fa-question-circle"></i>

- `?read.table`, `?str`, `?as.factor`
- Press F1 (in RStudio)
- [Stack Overflow](https://stackoverflow.com) ([`R`](https://stackoverflow.com/questions/tagged/r), [`ggplot2`](https://stackoverflow.com/questions/tagged/ggplot2))
- ChatGPT
- Ask your classmates or your teacher

---
layout: false
class: left, bottom, inverse, animated, bounceInDown

# Today's task

---
class: animated, fadeIn

# Today's task

1. Make sure **RStudio** is working
2. Make groups of 3-4 people and fill in the form on ATENeA
3. Install some packages you will need for the practical sessions:
    + `ggplot2`
    + `tidyr`
    + `shiny`
    + `plotly`
4. Work on the following exercises

---
class: animated, fadeIn

## Exercise: create some test plots

Run the following chunks using the [`iris`](https://en.wikipedia.org/wiki/Iris_flower_data_set) dataset and think about what is going on:

```r
library(ggplot2)
head(iris)
str(iris)
```

---
class: animated, fadeIn

## Exercise: create some test plots

<i class="fa fa-question-circle fa-fw"></i> What figure does the following command generate?

```r
ggplot(data = iris,
       mapping = aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot()
```

<br>

--

Here we can see, in a **box plot**, the distribution of petal length for each species.

---
class: animated, fadeIn

## Exercise: create some test plots

<i class="fa fa-question-circle fa-fw"></i> What figure does the following command generate?

```r
ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point() +
  theme_minimal()
```

<br>

--

Here we can see a **scatter plot** of sepal length against sepal width, with points coloured by species.

---
class: animated, fadeIn

## Exercise: describe a data set

Read the file in this [link](https://raw.githubusercontent.com/marta-coronado/data_visualization/refs/heads/main/P/0/data/sample.txt), make sure it is in tidy format, indicate the data type of each variable, and convert it to long format.

<i class="fa fa-key fa-fw"></i> Which column(s) will you use in the `cols` argument of the `pivot_longer` function?
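As an extra hint, the `cols` argument accepts either the columns to pivot or, with `-`, the columns to keep fixed. Below is a minimal sketch on a hypothetical toy table. It is not the exercise file; `toy`, `name`, `a` and `b` are made up for illustration.

```r
library(tidyr)

# Hypothetical toy table: one identifier column and two measurement columns
toy <- data.frame(
  name = c("x", "y"),
  a    = c(1, 2),
  b    = c(3, 4)
)

# Option 1: list the columns to pivot
long1 <- pivot_longer(toy, cols = c(a, b),
                      names_to = "variable", values_to = "value")

# Option 2: pivot every column except the identifier
long2 <- pivot_longer(toy, cols = -name,
                      names_to = "variable", values_to = "value")

identical(long1, long2)  # TRUE: both selections pick columns a and b
long1                    # 2 rows x 2 pivoted columns = 4 rows
```

The identifier columns are repeated on every new row, so the long table has (number of rows) x (number of pivoted columns) rows.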
<i class="fa fa-key fa-fw"></i> You can read a file directly from a URL: `read.table("https://...txt")`

Solutions in the next slides 😇

---
class: animated, fadeIn

## Exercise: describe a data set

Read the file in this [link](https://raw.githubusercontent.com/marta-coronado/data_visualization/refs/heads/main/P/0/data/sample.txt), make sure it is in tidy format, indicate the data type of each variable, and convert it to long format.

```r
data <- read.table(
  file = "https://raw.githubusercontent.com/marta-coronado/data_visualization/refs/heads/main/P/0/data/sample.txt",
  header = TRUE
)
head(data)
```

```
##      ID Group     Width   Height    Depth
## 1 ID001     B  6.518109 41.16221 4.637743
## 2 ID002     B  6.505138 44.10542 4.822542
## 3 ID003     A 11.428349 45.03662 5.149416
## 4 ID004     A 13.714852 58.60545 5.834809
## 5 ID005     B  8.802453 47.40527 4.546300
## 6 ID006     B 10.038084 58.58873 5.981818
```

```r
dim(data)
```

```
## [1] 100   5
```

---
class: animated, fadeIn

## Exercise: describe a data set

Indicate the data type of each variable.

```r
sapply(data, class)
```

```
##          ID       Group       Width      Height       Depth 
## "character" "character"   "numeric"   "numeric"   "numeric" 
```

```r
str(data)
```

```
## 'data.frame':	100 obs. of  5 variables:
##  $ ID    : chr  "ID001" "ID002" "ID003" "ID004" ...
##  $ Group : chr  "B" "B" "A" "A" ...
##  $ Width : num  6.52 6.51 11.43 13.71 8.8 ...
##  $ Height: num  41.2 44.1 45 58.6 47.4 ...
##  $ Depth : num  4.64 4.82 5.15 5.83 4.55 ...
```

---
class: animated, fadeIn

## Exercise: describe a data set

Convert from wide to long format using `tidyr::pivot_longer()`

```r
library(tidyr)
long_df <- pivot_longer(
  data,
  cols = -c("ID", "Group"),
  names_to = "metric",
  values_to = "value"
)

# Check dimensions: 100 rows x 3 measured columns = 300 rows
dim(long_df)
```

```
## [1] 300   4
```

```r
dim(data)
```

```
## [1] 100   5
```

---
class: animated, fadeIn

## Exercise: describe a data set

Convert from wide to long format using `tidyr::pivot_longer()`

```r
library(tidyr)
long_df <- pivot_longer(
  data,
  cols = -c("ID", "Group"),
  names_to = "metric",
  values_to = "value"
)
head(long_df)
```

```
## # A tibble: 6 × 4
##   ID    Group metric value
##   <chr> <chr> <chr>  <dbl>
## 1 ID001 B     Width   6.52
## 2 ID001 B     Height 41.2 
## 3 ID001 B     Depth   4.64
## 4 ID002 B     Width   6.51
## 5 ID002 B     Height 44.1 
## 6 ID002 B     Depth   4.82
```

---
class: animated, fadeIn

## See you next session

### Keep in touch

#### Theory lessons

<br>

| Marta Coronado Zamora | Jose F. Sánchez |
|:-:|:-:|
| <a href="mailto:Marta.coronado@uab.cat"><i class="fa fa-paper-plane fa-fw"></i> marta.coronado@uab.cat</a> | <a href="mailto:JoseFrancisco.Sanchez@uab.cat"><i class="fa fa-paper-plane fa-fw"></i> josefrancisco.sanchez@uab.cat</a> |
| <a href="https://bsky.app/profile/geneticament.bsky.social"><i class="fab fa-bluesky fa-fw"></i> @geneticament</a> | <a href="https://twitter.com/JFSanchezBioinf"><i class="fab fa-twitter fa-fw"></i> @JFSanchezBioinf</a> |
| <a href="https://portalrecerca.uab.cat/es/organisations/grup-de-gen%C3%B2mica-bioinform%C3%A0tica-i-biologia-evolutiva-gbbe/"><i class="fa fa-map-marker fa-fw"></i> Universitat Autònoma de Barcelona </a> | <a href="http://www.germanstrias.org/technology-services/genomica-bioinformatica/"> <i class="fa fa-map-marker fa-fw"></i>Germans Trias i Pujol Research Institute (IGTP)</a> |

#### Practical lessons

<br>

| Miriam Merenciano |
|:-:|
| <a href="mailto:miriam.merenciano@uab.cat"><i class="fa fa-paper-plane fa-fw"></i> miriam.merenciano@uab.cat </a> |
| <a href="https://portalrecerca.uab.cat/es/organisations/grup-de-gen%C3%B2mica-bioinform%C3%A0tica-i-biologia-evolutiva-gbbe/"><i class="fa fa-map-marker fa-fw"></i> Universitat Autònoma de Barcelona </a> |